Results 1 - 20 of 37
1.
Am J Hum Genet ; 110(4): 575-591, 2023 04 06.
Article in English | MEDLINE | ID: mdl-37028392

ABSTRACT

Leveraging linkage disequilibrium (LD) patterns as representative of population substructure enables the discovery of additive association signals in genome-wide association studies (GWASs). Standard GWASs are well-powered to interrogate additive models; however, new approaches are required for investigating other modes of inheritance such as dominance and epistasis. Epistasis, or non-additive interaction between genes, exists across the genome but often goes undetected because of a lack of statistical power. Furthermore, the adoption of LD pruning as customary in standard GWASs excludes detection of sites that are in LD but might underlie the genetic architecture of complex traits. We hypothesize that uncovering long-range interactions between loci with strong LD due to epistatic selection can elucidate genetic mechanisms underlying common diseases. To investigate this hypothesis, we tested for associations between 23 common diseases and 5,625,845 epistatic SNP-SNP pairs (determined by Ohta's D statistics) in long-range LD (>0.25 cM). Across five disease phenotypes, we identified one significant and four near-significant associations that replicated in two large genotype-phenotype datasets (UK Biobank and eMERGE). The genes that were most likely involved in the replicated associations were (1) members of highly conserved gene families with complex roles in multiple pathways, (2) essential genes, and/or (3) genes that were associated in the literature with complex traits that display variable expressivity. These results support the highly pleiotropic and conserved nature of variants in long-range LD under epistatic selection. Our work supports the hypothesis that epistatic interactions regulate diverse clinical mechanisms and might especially be driving factors in conditions with a wide range of phenotypic outcomes.
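
As background for the LD terminology in the abstract above, here is a minimal sketch of the basic two-locus disequilibrium coefficient D and the squared correlation r². It is not Ohta's D statistics, which the study used and which partition LD across subpopulations. The function name and the example frequencies are illustrative assumptions.

```python
def pairwise_ld(p_ab, p_a, p_b):
    """Return (D, r2) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A at locus 1 and B at locus 2
    p_a:  frequency of allele A at locus 1
    p_b:  frequency of allele B at locus 2
    """
    # D is the excess of the observed haplotype frequency over the
    # frequency expected if the two loci were independent.
    d = p_ab - p_a * p_b
    # r2 normalizes D squared by the product of the allelic variances.
    variance = p_a * (1.0 - p_a) * p_b * (1.0 - p_b)
    r2 = (d * d) / variance if variance > 0 else 0.0
    return d, r2


# Hypothetical frequencies: alleles A and B always occur together.
d, r2 = pairwise_ld(p_ab=0.5, p_a=0.5, p_b=0.5)
```

With these toy frequencies the haplotype is twice as common as independence would predict, so r² reaches its maximum; setting `p_ab` to `p_a * p_b` gives D = 0.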


Subject(s)
Genetic Epistasis, Genome-Wide Association Study, Linkage Disequilibrium/genetics, Genotype, Biological Specimen Banks, United Kingdom, Single Nucleotide Polymorphism/genetics
2.
J Pers Med ; 12(12)2022 Nov 29.
Article in English | MEDLINE | ID: mdl-36556195

ABSTRACT

The Penn Medicine BioBank (PMBB) is an electronic health record (EHR)-linked biobank at the University of Pennsylvania (Penn Medicine). A large variety of health-related information, ranging from diagnosis codes to laboratory measurements, imaging data and lifestyle information, is integrated with genomic and biomarker data in the PMBB to facilitate discoveries and translational science. To date, 174,712 participants have been enrolled into the PMBB, including approximately 30% of participants of non-European ancestry, making it one of the most diverse medical biobanks. There is a median of seven years of longitudinal data in the EHR available on participants, who also consent to permission to recontact. Herein, we describe the operations and infrastructure of the PMBB, summarize the phenotypic architecture of the enrolled participants, and use body mass index (BMI) as a proof-of-concept quantitative phenotype for PheWAS, LabWAS, and GWAS. The major representation of African-American participants in the PMBB addresses the essential need to expand the diversity in genetic and translational research. There is a critical need for a "medical biobank consortium" to facilitate replication, increase power for rare phenotypes and variants, and promote harmonized collaboration to optimize the potential for biological discovery and precision medicine.

3.
Pharmacogenomics J ; 19(2): 178-190, 2019 04.
Article in English | MEDLINE | ID: mdl-29795408

ABSTRACT

Identifying genetic variants associated with chemotherapeutic-induced toxicity is an important step towards personalized treatment of cancer patients. However, annotating and interpreting the associated genetic variants remains challenging because each associated variant is a surrogate for many other variants in the same region. The issue is further complicated when investigating patterns of associated variants with multiple drugs. In this study, we used biological knowledge to annotate and compare genetic variants associated with cellular sensitivity to mechanistically distinct chemotherapeutic drugs, including platinating agents (cisplatin, carboplatin), capecitabine, cytarabine, and paclitaxel. The most significantly associated SNPs from genome-wide association studies of cellular sensitivity to each drug in lymphoblastoid cell lines derived from populations of European (CEU) and African (YRI) descent were analyzed for their enrichment in biological pathways and processes. We annotated genetic variants using higher-level biological annotations in efforts to group variants into more interpretable biological modules. Using the higher-level annotations, we observed distinct biological modules associated with cell line populations as well as classes of chemotherapeutic drugs. We also integrated genetic variants and gene expression variables to build predictive models for chemotherapeutic drug cytotoxicity and prioritized the network models based on the enrichment of DNA regulatory data. Several biological annotations, often encompassing different SNPs, were replicated in independent datasets. By using biological knowledge and DNA regulatory information, we propose a novel approach for jointly analyzing genetic variants associated with multiple chemotherapeutic drugs.


Subject(s)
Genetic Variation/genetics, Genome-Wide Association Study/methods, Neoplasms/drug therapy, Pharmacogenetics/methods, Black People/genetics, Capecitabine/adverse effects, Capecitabine/therapeutic use, Carboplatin/adverse effects, Carboplatin/therapeutic use, Cell Line, Cisplatin/adverse effects, Cisplatin/therapeutic use, Neoplastic Gene Expression Regulation/drug effects, Human Genome/genetics, Humans, Molecular Sequence Annotation, Neoplasms/genetics, Paclitaxel/adverse effects, Paclitaxel/therapeutic use, Single Nucleotide Polymorphism/genetics, White People/genetics
4.
Nat Commun ; 8(1): 1167, 2017 10 27.
Article in English | MEDLINE | ID: mdl-29079728

ABSTRACT

Genome-wide, imputed, sequence, and structural data are now available for exceedingly large sample sizes. The needs for data management, handling population structure and related samples, and performing associations have largely been met. However, the infrastructure to support analyses involving complexity beyond genome-wide association studies is not standardized or centralized. We provide the PLatform for the Analysis, Translation, and Organization of large-scale data (PLATO), a software tool equipped to handle multi-omic data for hundreds of thousands of samples to explore complexity using genetic interactions, environment-wide association studies and gene-environment interactions, phenome-wide association studies, as well as copy number and rare variant analyses. Using the data from the Marshfield Personalized Medicine Research Project, a site in the electronic Medical Records and Genomics Network, we apply each feature of PLATO to type 2 diabetes and demonstrate how PLATO can be used to uncover the complex etiology of common traits.


Subject(s)
Computational Biology, Human Genome, Genome-Wide Association Study, Alcohol Drinking, Alleles, Genetic Databases, Type 2 Diabetes Mellitus/genetics, Diet, Genetic Epistasis, Gene Deletion, Gene Dosage, Gene-Environment Interaction, Genomics, Genotype, Glutamate Decarboxylase/genetics, Humans, Genetic Models, Phenotype, Single Nucleotide Polymorphism, Programming Languages, Recurrence, DNA Sequence Analysis, Software, Surveys and Questionnaires
5.
J Am Med Inform Assoc ; 24(3): 577-587, 2017 May 01.
Article in English | MEDLINE | ID: mdl-28040685

ABSTRACT

It is common that cancer patients have different molecular signatures even though they have similar clinical features, such as histology, due to the heterogeneity of tumors. To overcome this variability, we previously developed a new approach incorporating prior biological knowledge that identifies knowledge-driven genomic interactions associated with outcomes of interest. However, no systematic approach has been proposed to identify interaction models between pathways based on multi-omics data. Here we have proposed such a novel methodological framework, called metadimensional knowledge-driven genomic interactions (MKGIs). To test the utility of the proposed framework, we applied it to an ovarian cancer dataset including multi-omics profiles from The Cancer Genome Atlas to predict grade, stage, and survival outcome. We found that each knowledge-driven genomic interaction model, based on different genomic datasets, contains different sets of pathway features, which suggests that each genomic data type may contribute to outcomes in ovarian cancer via a different pathway. In addition, MKGI models significantly outperformed the single knowledge-driven genomic interaction model. From the MKGI models, many interactions between pathways associated with outcomes were found, including the mitogen-activated protein kinase (MAPK) signaling pathway and the gonadotropin-releasing hormone (GnRH) signaling pathway, which are known to play important roles in cancer pathogenesis. The beauty of incorporating biological knowledge into the model based on multi-omics data is the ability to improve diagnosis and prognosis and provide better interpretability. Thus, determining variability in molecular signatures based on these interactions between pathways may lead to better diagnostic/treatment strategies for better precision medicine.


Subject(s)
Genomics/methods, Genetic Models, Ovarian Neoplasms/genetics, Adult, Aged, Aged 80 and Over, Datasets as Topic, Female, Gene Expression, Humans, Middle Aged, Ovarian Neoplasms/diagnosis, Prognosis
6.
BioData Min ; 9: 18, 2016.
Article in English | MEDLINE | ID: mdl-27168765

ABSTRACT

BACKGROUND: The future of medicine is moving towards the phase of precision medicine, with the goal to prevent and treat diseases by taking inter-individual variability into account. A large part of the variability lies in our genetic makeup. With the fast-paced improvement of high-throughput methods for genome sequencing, a tremendous amount of genetics data have already been generated. The next hurdle for precision medicine is to have sufficient computational tools for analyzing large sets of data. Genome-Wide Association Studies (GWAS) have been the primary method to assess the relationship between single nucleotide polymorphisms (SNPs) and disease traits. While GWAS is sufficient in finding individual SNPs with strong main effects, it does not capture potential interactions among multiple SNPs. In many traits, a large proportion of variation remains unexplained by using main effects alone, leaving the door open for exploring the role of genetic interactions. However, identifying genetic interactions in large-scale genomics data poses a challenge even for modern computing. RESULTS: For this study, we present a new algorithm, Grammatical Evolution Bayesian Network (GEBN), that utilizes Bayesian Networks to identify interactions in the data, and at the same time, uses an evolutionary algorithm to reduce the computational cost associated with network optimization. GEBN excelled in simulation studies where the data contained main effects and interaction effects. We also applied GEBN to a Type 2 diabetes (T2D) dataset obtained from the Marshfield Personalized Medicine Research Project (PMRP). We were able to identify genetic interactions for T2D cases and controls and use information from those interactions to classify T2D samples. We obtained an average testing area under the curve (AUC) of 86.8%. We also identified several interacting genes such as INADL and LPP that are known to be associated with T2D.
CONCLUSIONS: Developing the computational tools to explore genetic associations beyond main effects remains a critically important challenge in human genetics. Methods, such as GEBN, demonstrate the utility of considering genetic interactions, as they likely explain some of the missing heritability.
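
The abstract above reports classification performance as a testing area under the curve (AUC). As a rough illustration of what that number measures, the sketch below computes AUC as the probability that a randomly chosen case receives a higher score than a randomly chosen control, counting ties as one half. The function name and the scores are made up for illustration and are not taken from the GEBN study.

```python
def auc(case_scores, control_scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0      # case ranked above control
            elif case == control:
                wins += 0.5      # ties count as half
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical classifier scores for two cases and two controls.
example_auc = auc([0.9, 0.4], [0.5, 0.1])
```

The pairwise form is quadratic in sample size; it is used here only because it makes the definition explicit.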

7.
Neurobiol Aging ; 38: 141-150, 2016 Feb.
Article in English | MEDLINE | ID: mdl-26827652

ABSTRACT

Late-onset Alzheimer disease (AD) has a complex genetic etiology, involving locus heterogeneity, polygenic inheritance, and gene-gene interactions; however, the investigation of interactions in recent genome-wide association studies has been limited. We used a biological knowledge-driven approach to evaluate gene-gene interactions for consistency across 13 data sets from the Alzheimer Disease Genetics Consortium. Fifteen single nucleotide polymorphism (SNP)-SNP pairs within 3 gene-gene combinations were identified: SIRT1 × ABCB1, PSAP × PEBP4, and GRIN2B × ADRA1A. In addition, we extend a previously identified interaction from an endophenotype analysis between RYR3 × CACNA1C. Finally, post hoc gene expression analyses of the implicated SNPs further implicate SIRT1 and ABCB1, and implicate CDH23 which was most recently identified as an AD risk locus in an epigenetic analysis of AD. The observed interactions in this article highlight ways in which genotypic variation related to disease may depend on the genetic context in which it occurs. Further, our results highlight the utility of evaluating genetic interactions to explain additional variance in AD risk and identify novel molecular mechanisms of AD pathogenesis.


Subject(s)
Alzheimer Disease/genetics, Datasets as Topic, Genetic Epistasis/genetics, Genetic Association Studies, ATP Binding Cassette Transporter Subfamily B/genetics, Cadherin Related Proteins, Cadherins/genetics, L-Type Calcium Channels/genetics, Disease Progression, Female, Humans, Male, Genetic Models, Phosphatidylethanolamine Binding Protein/genetics, Single Nucleotide Polymorphism, Adrenergic alpha-1 Receptors/genetics, N-Methyl-D-Aspartate Receptors/genetics, Risk, Ryanodine Receptor Calcium Release Channel/genetics, Saposins/genetics, Sirtuin 1/genetics
8.
J Biomed Inform ; 56: 220-8, 2015 Aug.
Article in English | MEDLINE | ID: mdl-26048077

ABSTRACT

Evaluation of survival models to predict cancer patient prognosis is one of the most important areas of emphasis in cancer research. A binary classification approach has difficulty directly predicting survival due to the characteristics of censored observations and the fact that the predictive power depends on the threshold used to set two classes. In contrast, the traditional Cox regression approach has some drawbacks in the sense that it does not allow for the identification of interactions between genomic features, which could have key roles associated with cancer prognosis. In addition, data integration is regarded as one of the important issues in improving the predictive power of survival models since cancer could be caused by multiple alterations through meta-dimensional genomic data including genome, epigenome, transcriptome, and proteome. Here we have proposed a new integrative framework designed to perform these three functions simultaneously: (1) predicting censored survival data; (2) integrating meta-dimensional omics data; (3) identifying interactions within/between meta-dimensional genomic features associated with survival. In order to predict censored survival time, martingale residuals were calculated as a new continuous outcome and a new fitness function used by the grammatical evolution neural network (GENN) based on mean absolute difference of martingale residuals was implemented. To test the utility of the proposed framework, a simulation study was conducted, followed by an analysis of meta-dimensional omics data including copy number, gene expression, DNA methylation, and protein expression data in breast cancer retrieved from The Cancer Genome Atlas (TCGA). On the basis of the results from the breast cancer dataset, we were able to identify interactions not only within a single dimension of genomic data but also between meta-dimensional omics data that are associated with survival.
Notably, the predictive power of our best meta-dimensional model was 73% which outperformed all of the other models conducted based on a single dimension of genomic data. Breast cancer is an extremely heterogeneous disease and the high levels of genomic diversity within/between breast tumors could affect the risk of therapeutic responses and disease progression. Thus, identifying interactions within/between meta-dimensional omics data associated with survival in breast cancer is expected to deliver direction for improved meta-dimensional prognostic biomarkers and therapeutic targets.
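
The abstract above turns censored survival times into a continuous outcome through martingale residuals and scores models by a mean absolute difference. The sketch below is one simplified reading of that idea, not the authors' implementation: it assumes a null model with no covariates, estimates the cumulative hazard with the Nelson-Aalen estimator, and defines the residual as the event indicator minus the cumulative hazard at each subject's follow-up time. All function names and the toy data are assumptions.

```python
def nelson_aalen(times, events):
    """Return a function giving the Nelson-Aalen cumulative hazard at time t."""
    event_times = sorted({t for t, e in zip(times, events) if e})
    steps = []
    hazard = 0.0
    for t in event_times:
        deaths = sum(1 for ti, ei in zip(times, events) if ti == t and ei)
        at_risk = sum(1 for ti in times if ti >= t)
        hazard += deaths / at_risk
        steps.append((t, hazard))

    def cumulative_hazard(t):
        value = 0.0
        for event_time, h in steps:
            if event_time > t:
                break
            value = h
        return value

    return cumulative_hazard


def martingale_residuals(times, events):
    """Residual = observed event indicator minus expected number of events."""
    cumulative_hazard = nelson_aalen(times, events)
    return [e - cumulative_hazard(t) for t, e in zip(times, events)]


def mean_absolute_difference(predicted, residuals):
    """Fitness-style score: smaller means predictions track the residuals better."""
    return sum(abs(p - r) for p, r in zip(predicted, residuals)) / len(residuals)


# Toy data: follow-up times and event indicators (1 = event, 0 = censored).
times = [1, 2, 3]
events = [1, 0, 1]
residuals = martingale_residuals(times, events)
baseline_score = mean_absolute_difference([0.0, 0.0, 0.0], residuals)
```

Under this null model the residuals sum to zero, which is a convenient sanity check. A model that uses covariates would replace the shared cumulative hazard with a subject-specific one.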


Subject(s)
Breast Neoplasms/mortality, Data Collection, Medical Informatics/methods, Survival Analysis, Algorithms, Breast Neoplasms/genetics, Breast Neoplasms/metabolism, Computational Biology/methods, Computer Simulation, DNA Methylation, Disease Progression, Epigenomics, Female, Gene Expression Profiling, Human Genome, Genomics, Humans, Statistical Models, Neural Networks (Computer), Prognosis, Proportional Hazards Models, Proteome, Software, Transcriptome, Treatment Outcome
9.
Pac Symp Biocomput ; : 495-505, 2015.
Article in English | MEDLINE | ID: mdl-25741542

ABSTRACT

Investigating the association between biobank derived genomic data and the information of linked electronic health records (EHRs) is an emerging area of research for dissecting the architecture of complex human traits, where cases and controls for study are defined through the use of electronic phenotyping algorithms deployed in large EHR systems. For our study, cataract cases and controls were identified within the Marshfield Personalized Medicine Research Project (PMRP) biobank and linked EHR, which is a member of the NHGRI-funded electronic Medical Records and Genomics (eMERGE) Network. Our goal was to explore potential gene-gene and gene-environment interactions within these data for 527,953 and 527,936 single nucleotide polymorphisms (SNPs) for gene-gene and gene-environment analyses, respectively, with minor allele frequency > 1%, in order to explore higher level associations with cataract risk beyond investigations of single SNP-phenotype associations. To build our SNP-SNP interaction models we utilized a prior-knowledge driven filtering method called Biofilter to minimize the multiple testing burden of exploring the vast array of interaction models possible from our extensive number of SNPs. Using Biofilter, we developed 57,376 prior-knowledge directed SNP-SNP models to test for association with cataract status. We selected models that required 6 sources of external domain knowledge. We identified 13 statistically significant SNP-SNP models with an interaction with p-value < 1 × 10^-4, as well as an overall model with p-value < 0.01 associated with cataract status. We also conducted gene-environment interaction analyses for all GWAS SNPs and a set of environmental factors from the PhenX Toolkit: smoking, UV exposure, and alcohol use; these environmental factors have been previously associated with the formation of cataracts.
We found a total of 782 gene-environment models that exhibit an interaction with a p-value < 1 × 10^-4 associated with cataract status. Our results show these approaches enable advanced searches for epistasis and gene-environment interactions beyond GWAS, and that the EHR based approach provides an additional source of data for seeking these advanced explanatory models of the etiology of complex disease/outcome such as cataracts.
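
To give a sense of the multiple testing burden mentioned in the abstract above, the short calculation below compares the number of all possible SNP-SNP pairs among 527,953 SNPs with the 57,376 knowledge-directed models that were actually tested. The Bonferroni threshold at a family-wise level of 0.05 is shown only as an illustrative assumption; the abstract reports its own significance cut-offs.

```python
import math

n_snps = 527_953            # SNPs in the gene-gene analysis (from the abstract)
filtered_models = 57_376    # Biofilter SNP-SNP models (from the abstract)

# Every unordered pair of distinct SNPs would be one exhaustive interaction test.
exhaustive_pairs = math.comb(n_snps, 2)

# How many times smaller the filtered search space is.
reduction_factor = exhaustive_pairs / filtered_models

# Illustrative Bonferroni thresholds (family-wise alpha of 0.05 is an assumption).
alpha = 0.05
bonferroni_filtered = alpha / filtered_models
bonferroni_exhaustive = alpha / exhaustive_pairs

print(f"exhaustive pairs: {exhaustive_pairs:,}")
print(f"reduction factor: {reduction_factor:.3g}")
print(f"Bonferroni threshold, filtered:   {bonferroni_filtered:.3g}")
print(f"Bonferroni threshold, exhaustive: {bonferroni_exhaustive:.3g}")
```

The exhaustive search is on the order of 10^11 tests, so filtering by prior knowledge shrinks the search space by several orders of magnitude.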


Subject(s)
Cataract/genetics, Algorithms, Biological Specimen Banks, Case-Control Studies, Computational Biology, Genetic Databases, Electronic Health Records, Genetic Epistasis, Gene-Environment Interaction, Genome-Wide Association Study, Humans, Phenotype, Single Nucleotide Polymorphism, Software
10.
Pac Symp Biocomput ; : 96-107, 2015.
Article in English | MEDLINE | ID: mdl-25592572

ABSTRACT

Enormous efforts of whole exome and genome sequencing from hundreds to thousands of patients have provided the landscape of somatic genomic alterations in many cancer types to distinguish between driver mutations and passenger mutations. Driver mutations show strong associations with cancer clinical outcomes such as survival. However, due to the heterogeneity of tumors, somatic mutation profiles are exceptionally sparse whereas other types of genomic data such as miRNA or gene expression contain much more complete data for all genomic features with quantitative values measured in each patient. To overcome the extreme sparseness of somatic mutation profiles and allow for the discovery of combinations of somatic mutations that may predict cancer clinical outcomes, here we propose a new approach for binning somatic mutations based on existing biological knowledge. Through the analysis of a renal cell carcinoma dataset from The Cancer Genome Atlas (TCGA), we identified combinations of somatic mutation burden based on pathways, protein families, evolutionarily conserved regions, and regulatory regions associated with survival. Due to the nature of heterogeneity in cancer, using a binning strategy for somatic mutation profiles based on biological knowledge will be valuable for improved prognostic biomarkers and potentially for tailoring therapeutic strategies by identifying combinations of driver mutations.
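
The binning idea in the abstract above can be pictured as collapsing a sparse per-gene mutation list into a denser per-bin burden count. The sketch below assumes each gene maps to one or more knowledge-based bins (for example, pathways); the function name, gene labels, and bin labels are placeholders, not data from the study.

```python
from collections import defaultdict


def bin_mutation_burden(patient_mutations, gene_to_bins):
    """Count mutated genes per knowledge-based bin for each patient.

    patient_mutations: dict mapping patient ID to a list of mutated genes
    gene_to_bins:      dict mapping gene to the bins (e.g. pathways) it belongs to
    """
    burden = {}
    for patient, genes in patient_mutations.items():
        counts = defaultdict(int)
        for gene in genes:
            # Genes without any annotation contribute to no bin.
            for bin_name in gene_to_bins.get(gene, ()):
                counts[bin_name] += 1
        burden[patient] = dict(counts)
    return burden


# Placeholder annotation and mutation calls.
gene_to_bins = {
    "GENE_A": ["pathway_1"],
    "GENE_B": ["pathway_1", "pathway_2"],
    "GENE_C": ["pathway_2"],
}
patient_mutations = {
    "patient_1": ["GENE_A", "GENE_B"],
    "patient_2": ["GENE_C", "GENE_X"],  # GENE_X has no annotation
}
burden = bin_mutation_burden(patient_mutations, gene_to_bins)
```

Two patients who share no mutated gene can still share a mutated bin, which is what makes the binned profile less sparse than the gene-level one.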


Subject(s)
Renal Cell Carcinoma/genetics, Kidney Neoplasms/genetics, Mutation, Tumor Biomarkers/genetics, Renal Cell Carcinoma/mortality, Computational Biology, Genetic Databases, Humans, Kidney Neoplasms/mortality, Genetic Models, Neural Networks (Computer), Prognosis, Survival Analysis
11.
BioData Min ; 7: 20, 2014.
Article in English | MEDLINE | ID: mdl-25214892

ABSTRACT

BACKGROUND: Effective cancer clinical outcome prediction for understanding of the mechanism of various types of cancer has been pursued using molecular-based data such as gene expression profiles, an approach that has promise for providing better diagnostics and supporting further therapies. However, clinical outcome prediction based on gene expression profiles varies between independent data sets. Further, single-gene expression outcome prediction is limited for cancer evaluation since genes do not act in isolation, but rather interact with other genes in complex signaling or regulatory networks. In addition, since pathways are more likely to cooperate, it would be desirable to incorporate expert knowledge to combine pathways in a useful and informative manner. METHODS: Thus, we propose a novel approach for identifying knowledge-driven genomic interactions and applying it to discover models associated with cancer clinical phenotypes using grammatical evolution neural networks (GENN). In order to demonstrate the utility of the proposed approach, ovarian cancer data from The Cancer Genome Atlas (TCGA) were used for predicting clinical stage as a pilot project. RESULTS: We identified knowledge-driven genomic interactions associated with cancer stage from single knowledge bases such as sources of pathway-pathway interaction, but also knowledge-driven genomic interactions across different sets of knowledge bases such as pathway-protein family interactions by integrating different types of information. Notably, an integration model from different sources of biological knowledge achieved 78.82% balanced accuracy and outperformed the top models with gene expression or single knowledge-based data types alone. Furthermore, the results from the models are more interpretable because they are framed in the context of specific biological pathways or other expert knowledge.
CONCLUSIONS: The success of the pilot study we have presented herein will allow us to pursue further identification of models predictive of clinical cancer survival and recurrence. Understanding the underlying tumorigenesis and progression in ovarian cancer through the global view of interactions within/between different biological knowledge sources has the potential for providing more effective screening strategies and therapeutic targets for many types of cancer.
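
The abstract above reports model performance as balanced accuracy. For readers unfamiliar with the metric, the sketch below computes it for a two-class problem as the mean of sensitivity and specificity, which keeps a majority class from dominating the score. The function name and the counts are hypothetical.

```python
def balanced_accuracy(tp, fn, tn, fp):
    """Mean of sensitivity and specificity for a binary classifier."""
    sensitivity = tp / (tp + fn)   # fraction of true positives recovered
    specificity = tn / (tn + fp)   # fraction of true negatives recovered
    return (sensitivity + specificity) / 2.0


# Hypothetical confusion-matrix counts: 4 positives, 2 negatives.
score = balanced_accuracy(tp=3, fn=1, tn=1, fp=1)
```

Plain accuracy on the same counts would be 4 correct out of 6; balanced accuracy weights the two classes equally regardless of their sizes.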

12.
Pac Symp Biocomput ; : 200-11, 2014.
Article in English | MEDLINE | ID: mdl-24297547

ABSTRACT

Environment-wide association studies (EWAS) provide a way to uncover the environmental mechanisms involved in complex traits in a high-throughput manner. Genome-wide association studies have led to the discovery of genetic variants associated with many common diseases but do not take into account the environmental component of complex phenotypes. This EWAS assesses the comprehensive association between environmental variables and the outcome of type 2 diabetes (T2D) in the Marshfield Personalized Medicine Research Project Biobank (Marshfield PMRP). We sought replication in two National Health and Nutrition Examination Surveys (NHANES). The Marshfield PMRP currently uses four tools for measuring environmental exposures and outcome traits: 1) the PhenX Toolkit includes standardized exposure and phenotypic measures across several domains, 2) the Diet History Questionnaire (DHQ) is a food frequency questionnaire, 3) the Measurement of a Person's Habitual Physical Activity scores the level of an individual's physical activity, and 4) electronic health records (EHR) employs validated algorithms to establish T2D case-control status. Using PLATO software, 314 environmental variables were tested for association with T2D using logistic regression, adjusting for sex, age, and BMI in over 2,200 European Americans. When available, similar variables were tested with the same methods and adjustment in samples from NHANES III and NHANES 1999-2002. Twelve and 31 associations were identified in the Marshfield samples at p<0.01 and p<0.05, respectively. Seven and 13 measures replicated in at least one of the NHANES at p<0.01 and p<0.05, respectively, with the same direction of effect. The most significant environmental exposures associated with T2D status included decreased alcohol use as well as increased smoking exposure in childhood and adulthood. 
The results demonstrate the utility of the EWAS method and survey tools for identifying environmental components of complex diseases like type 2 diabetes. These high-throughput and comprehensive investigation methods can easily be applied to investigate the relation between environmental exposures and multiple phenotypes in future analyses.
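
The abstract above counts an association as replicated when it reaches the significance threshold in a second cohort with the same direction of effect. A minimal sketch of that check follows; the dictionary layout, the function name, and the numbers are assumptions for illustration and do not come from the Marshfield or NHANES results.

```python
def replicates(discovery, replication, alpha=0.05):
    """Return True if both results pass alpha and their effects share a sign.

    Each result is a dict with 'beta' (effect estimate, e.g. a log odds ratio)
    and 'p' (p-value).
    """
    same_direction = discovery["beta"] * replication["beta"] > 0
    both_significant = discovery["p"] < alpha and replication["p"] < alpha
    return same_direction and both_significant


# Hypothetical results for one environmental variable in two cohorts.
discovery_result = {"beta": -0.40, "p": 0.003}
replication_result = {"beta": -0.20, "p": 0.03}

replicated_at_05 = replicates(discovery_result, replication_result, alpha=0.05)
replicated_at_01 = replicates(discovery_result, replication_result, alpha=0.01)
```

Requiring a matching sign guards against counting a variable as replicated when it is significant in both cohorts but protective in one and harmful in the other.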


Subject(s)
Type 2 Diabetes Mellitus/etiology, Environment, Biological Specimen Banks, Computational Biology, Diet Records, Environmental Exposure, Female, Gene-Environment Interaction, Humans, Male, Motor Activity, Nutrition Surveys, Phenotype, Precision Medicine, Software, Wisconsin
13.
Bioinformatics ; 30(5): 698-705, 2014 Mar 01.
Article in English | MEDLINE | ID: mdl-24149050

ABSTRACT

MOTIVATION: Advancements in high-throughput technology have allowed researchers to examine the genetic etiology of complex human traits in a robust fashion. Although genome-wide association studies have identified many novel variants associated with hundreds of traits, a large proportion of the estimated trait heritability remains unexplained. One hypothesis is that the commonly used statistical techniques and study designs are not robust to the complex etiology that may underlie these human traits. This etiology could include non-linear gene × gene or gene × environment interactions. Additionally, other levels of biological regulation may play a large role in trait variability. RESULTS: To address the need for computational tools that can explore enormous datasets to detect complex susceptibility models, we have developed a software package called the Analysis Tool for Heritable and Environmental Network Associations (ATHENA). ATHENA combines various variable filtering methods with machine learning techniques to analyze high-throughput categorical (e.g. single nucleotide polymorphisms) and quantitative (e.g. gene expression levels) predictor variables to generate multivariable models that predict either categorical (e.g. disease status) or quantitative (e.g. cholesterol levels) outcomes. The goal of this article is to demonstrate the utility of ATHENA using simulated and biological datasets that consist of both single nucleotide polymorphisms and gene expression variables to identify complex prediction models. Importantly, this method is flexible and can be expanded to include other types of high-throughput data (e.g. RNA-seq data and biomarker measurements). AVAILABILITY: ATHENA is freely available for download. The software, user manual and tutorial can be downloaded from http://ritchielab.psu.edu/ritchielab/software.


Subject(s)
Gene-Environment Interaction, Genome-Wide Association Study, Software, Humans, Phenotype, Single Nucleotide Polymorphism
14.
BioData Min ; 6(1): 23, 2013 Dec 20.
Article in English | MEDLINE | ID: mdl-24359638

ABSTRACT

BACKGROUND: Gene expression profiles have been broadly used in cancer research as a diagnostic or prognostic signature for the clinical outcome prediction such as stage, grade, metastatic status, recurrence, and patient survival, as well as to potentially improve patient management. However, emerging evidence shows that gene expression-based prediction varies between independent data sets. One possible explanation of this effect is that previous studies were focused on identifying genes with large main effects associated with clinical outcomes. Thus, non-linear interactions without large individual main effects would be missed. The other possible explanation is that gene expression as a single level of genomic data is insufficient to explain the clinical outcomes of interest since cancer can be dysregulated by multiple alterations through genome, epigenome, transcriptome, and proteome levels. In order to overcome the variability of diagnostic or prognostic predictors from gene expression alone and to increase its predictive power, we need to integrate multi-levels of genomic data and identify interactions between them associated with clinical outcomes. RESULTS: Here, we proposed an integrative framework for identifying interactions within/between multi-levels of genomic data associated with cancer clinical outcomes using the Grammatical Evolution Neural Networks (GENN). In order to demonstrate the validity of the proposed framework, ovarian cancer data from TCGA was used as a pilot task. We found not only interactions within a single genomic level but also interactions between multi-levels of genomic data associated with survival in ovarian cancer. Notably, the integration model from different levels of genomic data achieved 72.89% balanced accuracy and outperformed the top models with any single level of genomic data. 
CONCLUSIONS: Understanding the underlying tumorigenesis and progression in ovarian cancer through the global view of interactions within/between different levels of genomic data is expected to provide guidance for improved prognostic biomarkers and individualized therapies.

15.
Pac Symp Biocomput ; : 147-58, 2013.
Article in English | MEDLINE | ID: mdl-23424120

ABSTRACT

Investigating the association between biobank-derived genomic data and the information in linked electronic health records (EHRs) is an emerging area of research for dissecting the architecture of complex human traits, where cases and controls for study are defined through the use of electronic phenotyping algorithms deployed in large EHR systems. For our study, 2580 cataract cases and 1367 controls were identified within the Marshfield Personalized Medicine Research Project (PMRP) Biobank and linked EHR, which is a member of the NHGRI-funded electronic Medical Records and Genomics (eMERGE) Network. Our goal was to explore potential gene-gene and gene-environment interactions within these data for 529,431 single nucleotide polymorphisms (SNPs) with minor allele frequency > 1%, in order to explore higher-level associations with cataract risk beyond investigations of single SNP-phenotype associations. To build our SNP-SNP interaction models, we utilized a prior-knowledge-driven filtering method called Biofilter to minimize the multiple testing burden of exploring the vast array of interaction models possible from our extensive number of SNPs. Using the Biofilter, we developed 57,376 prior-knowledge directed SNP-SNP models to test for association with cataract status. We selected models that required 6 sources of external domain knowledge. We identified 5 statistically significant models with an interaction term with p-value < 0.05, as well as an overall model with p-value < 0.05 associated with cataract status. We also conducted gene-environment interaction analyses for all GWAS SNPs and a set of environmental factors from the PhenX Toolkit: smoking, UV exposure, and alcohol use; these environmental factors have been previously associated with the formation of cataracts. We found a total of 288 models that exhibit an interaction term with a p-value ≤ 1×10^-4 associated with cataract status.
Our results show that these approaches enable advanced searches for epistasis and gene-environment interactions beyond GWAS, and that the EHR-based approach provides an additional source of data for seeking these advanced explanatory models of the etiology of complex diseases/outcomes such as cataracts.
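
To make the multiple testing burden mentioned above concrete, the following Python sketch compares the number of exhaustive pairwise SNP-SNP models with the number of Biofilter-derived models reported in the abstract. The SNP and model counts come from the abstract; the Bonferroni correction is used purely as an illustration, since the abstract does not state which correction, if any, was applied, and the function names are hypothetical.

```python
from math import comb

N_SNPS = 529_431             # SNPs with minor allele frequency > 1% (from the abstract)
N_BIOFILTER_MODELS = 57_376  # prior-knowledge directed SNP-SNP models (from the abstract)


def exhaustive_pair_count(n_snps):
    """Number of unordered SNP-SNP pairs in an exhaustive pairwise search."""
    return comb(n_snps, 2)


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction (illustrative assumption)."""
    return alpha / n_tests


all_pairs = exhaustive_pair_count(N_SNPS)
print(f"exhaustive pairs:  {all_pairs:,}")
print(f"Biofilter models:  {N_BIOFILTER_MODELS:,}")
print(f"reduction factor:  {all_pairs / N_BIOFILTER_MODELS:,.0f}x")
print(f"Bonferroni threshold, exhaustive: {bonferroni_threshold(0.05, all_pairs):.2e}")
print(f"Bonferroni threshold, Biofilter:  {bonferroni_threshold(0.05, N_BIOFILTER_MODELS):.2e}")
```

The point of the comparison is only the scale of the search space: restricting the models to those supported by prior knowledge shrinks the number of tests by several orders of magnitude.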


Subject(s)
Cataract/etiology , Cataract/genetics , Epistasis, Genetic , Gene-Environment Interaction , Aged , Case-Control Studies , Computational Biology , Databases, Genetic/statistics & numerical data , Electronic Health Records/statistics & numerical data , Female , Genome-Wide Association Study/statistics & numerical data , Humans , Male , Middle Aged , Models, Genetic , Models, Statistical , Polymorphism, Single Nucleotide , Software
16.
Pac Symp Biocomput ; : 385-96, 2013.
Article in English | MEDLINE | ID: mdl-23424143

ABSTRACT

Technology is driving the field of human genetics research with advances in techniques to generate high-throughput data that interrogate various levels of biological regulation. With this massive amount of data comes the important task of using powerful bioinformatics techniques to sift through the noise to find true signals that predict various human traits. A popular analytical method thus far has been the genome-wide association study (GWAS), which assesses the association of single nucleotide polymorphisms (SNPs) with the trait of interest. Unfortunately, GWAS has not been able to explain a substantial proportion of the estimated heritability for most complex traits. Due to the inherently complex nature of biology, this phenomenon could be a consequence of the simplistic study design. A more powerful analysis may be a systems biology approach that integrates different types of data, or a meta-dimensional analysis. For this study we used the Analysis Tool for Heritable and Environmental Network Associations (ATHENA) to integrate high-throughput SNPs and gene expression variables (EVs) to predict high-density lipoprotein cholesterol (HDL-C) levels. We generated multivariable models that consisted of SNPs only, EVs only, and SNPs + EVs with testing r-squared values of 0.16, 0.11, and 0.18, respectively. Additionally, using just the SNPs and EVs from the best models, we generated a model with a testing r-squared of 0.32. A linear regression model with the same variables resulted in an adjusted r-squared of 0.23. With this systems biology approach, we were able to integrate different types of high-throughput data to generate meta-dimensional models that are predictive of HDL-C in our data set. Additionally, our modeling method was able to capture more of the HDL-C variation than a linear regression model that included the same variables.
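
The abstract above reports both r-squared and adjusted r-squared. The following Python sketch shows how the two quantities are conventionally defined; the data values, sample size, and predictor count are invented for illustration, since the abstract gives none of them, and the function names are hypothetical.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)             # total sum of squares
    return 1 - ss_res / ss_tot


def adjusted_r_squared(r2, n_samples, n_predictors):
    """Penalize r-squared for the number of predictors in the model."""
    return 1 - (1 - r2) * (n_samples - 1) / (n_samples - n_predictors - 1)


# Illustrative values only (assumption), not HDL-C measurements.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]

r2 = r_squared(observed, predicted)
print(r2)
print(adjusted_r_squared(r2, n_samples=len(observed), n_predictors=1))
```

Because the adjustment depends on sample size and the number of predictors, a testing r-squared and an adjusted r-squared are related measures rather than the same quantity.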


Subject(s)
Cholesterol, HDL/blood , Cholesterol, HDL/genetics , Software , Algorithms , Computational Biology , Databases, Genetic/statistics & numerical data , Gene Expression , Gene-Environment Interaction , Genome-Wide Association Study/statistics & numerical data , Genotype , HapMap Project , High-Throughput Screening Assays/statistics & numerical data , Humans , Meta-Analysis as Topic , Models, Genetic , Neural Networks, Computer , Polymorphism, Single Nucleotide , Systems Biology
17.
BioData Min ; 5(1): 5, 2012 Jun 08.
Article in English | MEDLINE | ID: mdl-22682510

ABSTRACT

BACKGROUND: Phenome-wide association studies (PheWAS) can be used to investigate the association between single nucleotide polymorphisms (SNPs) and a wide spectrum of phenotypes. This approach is complementary to genome-wide association studies (GWAS), which calculate the association between hundreds of thousands of SNPs and one or a limited range of phenotypes. The extensive exploration of the association between phenotypic structure and genotypic variation through PheWAS produces a set of complex and comprehensive results. Integral to fully inspecting, analysing, and interpreting PheWAS results is visualization of the data. RESULTS: We have developed the software PheWAS-View for visually integrating PheWAS results, including information about the SNPs, relevant genes, phenotypes, and the interrelationships between phenotypes that exist in PheWAS. As a result, both the fine-grained detail and the larger trends that exist within PheWAS results can be elucidated. CONCLUSIONS: PheWAS can be used to discover novel relationships between SNPs, phenotypes, and networks of interrelated phenotypes; identify pleiotropy; provide novel mechanistic insights; and foster hypothesis generation. These results can be both explored and presented with PheWAS-View. PheWAS-View is freely available for non-commercial research institutions; for full details, see http://ritchielab.psu.edu/ritchielab/software.

18.
Ann Hum Genet ; 75(1): 78-89, 2011 Jan.
Article in English | MEDLINE | ID: mdl-21158747

ABSTRACT

Analyzing the combined effects of genes and/or environmental factors on the development of complex diseases is a great challenge from both the statistical and computational perspectives, even with a relatively small number of genetic and nongenetic exposures. Several data-mining methods have been proposed for interaction analysis; among them, the Multifactor Dimensionality Reduction (MDR) method has proven its utility in a variety of theoretical and practical settings. Model-Based Multifactor Dimensionality Reduction (MB-MDR), a relatively new MDR-based technique that is able to unify the best of both nonparametric and parametric worlds, was developed to address some of the remaining concerns that go along with an MDR analysis. These include the restriction to univariate, dichotomous traits, the absence of flexible ways to adjust for lower-order effects and important confounders, and the difficulty in highlighting epistatic effects when too many multilocus genotype cells are pooled into two new genotype groups. We investigate the empirical power of MB-MDR to detect gene-gene interactions in the absence of any noise and in the presence of genotyping error, missing data, phenocopy, and genetic heterogeneity. Power is generally higher for MB-MDR than for MDR, in particular in the presence of genetic heterogeneity, phenocopy, or low minor allele frequencies.
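
The pooling of multilocus genotype cells into two groups, which the abstract above identifies as a concern with MDR, can be sketched as follows. This is a simplified illustration of the classic two-group MDR labelling step for one pair of loci, not an implementation of MB-MDR; the genotype coding (0/1/2), the default threshold (overall case:control ratio), the toy data, and the function name are all assumptions made for the example.

```python
from collections import Counter


def mdr_risk_labels(genotypes_a, genotypes_b, status, threshold=None):
    """Label each two-locus genotype cell 'high' or 'low' risk.

    genotypes_a, genotypes_b: per-individual genotypes coded 0, 1, or 2.
    status: per-individual outcome, 1 = case, 0 = control.
    threshold: case:control ratio cut-off; defaults to the overall ratio.
    """
    cases, controls = Counter(), Counter()
    for a, b, s in zip(genotypes_a, genotypes_b, status):
        if s == 1:
            cases[(a, b)] += 1
        else:
            controls[(a, b)] += 1

    if threshold is None:
        total_controls = sum(controls.values())
        if total_controls == 0:
            raise ValueError("at least one control is required")
        threshold = sum(cases.values()) / total_controls

    labels = {}
    for cell in set(cases) | set(controls):
        n_case, n_ctrl = cases[cell], controls[cell]
        # A cell containing cases but no controls is treated as high risk.
        ratio = n_case / n_ctrl if n_ctrl else float("inf")
        labels[cell] = "high" if ratio >= threshold else "low"
    return labels


# Toy data (assumption): 8 individuals, 4 cases and 4 controls.
locus_a = [0, 0, 0, 1, 1, 1, 2, 2]
locus_b = [0, 0, 1, 1, 1, 0, 2, 2]
status = [1, 1, 0, 0, 0, 1, 1, 0]

for cell, label in sorted(mdr_risk_labels(locus_a, locus_b, status).items()):
    print(cell, label)
```

Every cell is forced into one of two groups here, even a cell with a single case and a single control, which illustrates why pooling too many cells can blur an epistatic signal.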


Subject(s)
Disease/genetics , Epistasis, Genetic , Models, Genetic , Multifactor Dimensionality Reduction , Case-Control Studies , Computer Simulation
19.
Pac Symp Biocomput ; : 265-75, 2011.
Article in English | MEDLINE | ID: mdl-21121054

ABSTRACT

In this paper, we describe using Synthesis-View, a new method of presenting complex genetic data, to revisit results of a study from the BioVU Vanderbilt DNA databank. BioVU is a biorepository of DNA samples coupled with de-identified electronic medical records (EMR). In the Ritchie et al. study, ~10,000 BioVU samples were genotyped for 21 SNPs that were previously associated with 5 diseases: atrial fibrillation, Crohn disease, multiple sclerosis, rheumatoid arthritis, and type 2 diabetes. In the proof-of-concept study, the 21 tests of association replicated previous findings where sample size provided adequate power. The majority of the BioVU results were originally presented in tabular form. Herein we have revisited the results of this study using Synthesis-View. The Synthesis-View software tool visually synthesizes the results of complex, multi-layered studies that aim to characterize associations between small numbers of single-nucleotide polymorphisms (SNPs) and diseases and/or phenotypes, such as the results of replication and meta-analysis studies. Using Synthesis-View with the data of the Ritchie et al. study and presenting these data in this integrated visual format demonstrates new ways to investigate and interpret these kinds of data. Synthesis-View is freely available for non-commercial research institutions; for full details, see https://chgr.mc.vanderbilt.edu/synthesisview.


Subject(s)
Databases, Nucleic Acid/statistics & numerical data , Software , Algorithms , Computational Biology , Computer Graphics , Disease/genetics , Genetic Association Studies/statistics & numerical data , Genome-Wide Association Study/statistics & numerical data , Humans , Polymorphism, Single Nucleotide
20.
Genet Evol Comput Conf ; 12: 203-210, 2010.
Article in English | MEDLINE | ID: mdl-21152364

ABSTRACT

Recent advances in genotyping technology have led to the generation of an enormous quantity of genetic data. Traditional methods of statistical analysis have proved insufficient in extracting all of the information about the genetic components of common, complex human diseases. A contributing factor to the problem of analysis is that amongst the small main effects of each single gene on disease susceptibility, there are non-linear, gene-gene interactions that can be difficult for traditional, parametric analyses to detect. In addition, exhaustively searching all multi-locus combinations has proved computationally impractical. Novel strategies for analysis have been developed to address these issues. The Analysis Tool for Heritable and Environmental Network Associations (ATHENA) is an analytical tool that incorporates grammatical evolution neural networks (GENN) to detect interactions among genetic factors. Initial parameters define how the evolutionary process will be implemented. This research addresses how different parameter settings affect detection of disease models involving interactions. In the current study, we iterate over multiple parameter values to determine which combinations appear optimal for detecting interactions in simulated data for multiple genetic models. Our results indicate that the factors that have the greatest influence on detection are: input variable encoding, population size, and parallel computation.
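
Iterating over multiple parameter values, as described above, amounts to enumerating a grid of settings. The following Python sketch shows one way to enumerate such a grid; the three factor names echo those the abstract reports as most influential, but the specific values, key names, and function name are hypothetical, and the sketch does not call ATHENA or GENN.

```python
from itertools import product

# Hypothetical settings: the abstract names the factors but not the values tested.
PARAMETER_GRID = {
    "input_encoding": ["additive", "dummy"],
    "population_size": [100, 500, 1000],
    "parallel_demes": [1, 4],
}


def parameter_combinations(grid):
    """Yield one dict per combination of parameter values in the grid."""
    names = list(grid)
    for values in product(*(grid[name] for name in names)):
        yield dict(zip(names, values))


combos = list(parameter_combinations(PARAMETER_GRID))
print(f"{len(combos)} combinations to evaluate")
for combo in combos[:3]:
    print(combo)
```

Each resulting dict would be handed to a separate run on the simulated data sets, so that detection performance can be compared across settings.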
